Video Compression with Wavelet Transform Using SPIHT Algorithm in MATLAB Simulink

 

Chhabikiran Sao and Amit Kolhe

Rungta College of Engineering and Technology, Bhilai, CG, India

*Corresponding Author E-mail: chhabikiran.sao@gmail.com

 

ABSTRACT:

Motion estimation is a common ingredient in many state-of-the-art video processing algorithms, serving as an effective way to capture the spatio-temporal correlation in video signals. However, the robustness of motion estimation often suffers from problems such as ambiguities of the motion trajectory (i.e., the aperture problem) and illumination variations. In this paper, we explore a new framework for video processing based on the recently proposed wavelet transform with the SPIHT algorithm. Instead of containing an explicit motion estimation step, the wavelet transform provides a motion-selective subband decomposition for video signals. We demonstrate the potential of this new technique in a video denoising application.

 

KEYWORDS: Motion estimation, Spatial, temporal, trajectory, decomposition, denoising.

 

 


1. INTRODUCTION:

Three-dimensional (3-D) video compression using wavelet decomposition along the temporal axis dictates that a number of video frames must be buffered to allow for the temporal decomposition. Buffering of frames allows the temporal correlation to be exploited, and the larger the buffer, the more effective the decomposition [1]. In many applications, such as the design of multimedia workstations and high-quality audio transmission and storage, the goal is to achieve transparent coding of audio and speech signals at the lowest possible data rates. In other words, bandwidth costs money; the transmission and storage of information are therefore costly. Further reduction in bit rate is an attractive proposition in applications such as remote broadcast lines, studio links, satellite transmission of high-quality audio, and voice over internet [2]. The wavelet transform decomposes a video frame into a set of subframes with different resolutions corresponding to different frequency bands. These multiresolution frames also provide a representation of the global motion structure of the video signal at different scales. The motion activities for a particular subframe at different resolutions are different but highly correlated, since they actually specify the same motion structure at different scales [4, 10].
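As a concrete illustration of this multiresolution structure, the following minimal MATLAB sketch decomposes a single grayscale frame over three scales. It assumes the Wavelet Toolbox is available and uses the 'bior4.4' wavelet (the CDF 9/7 filters used later in this paper); the input file name is hypothetical.

% Minimal sketch: 3-level 2-D wavelet decomposition of one grayscale frame.
% Requires the Wavelet Toolbox; 'bior4.4' implements the CDF 9/7 filters.
frame = double(imread('frame001.png'));      % hypothetical grayscale frame
levels = 3;
[C, S] = wavedec2(frame, levels, 'bior4.4'); % C: coefficients, S: bookkeeping matrix
A = appcoef2(C, S, 'bior4.4', levels);       % coarsest low-resolution approximation
for k = 1:levels
    [H, V, D] = detcoef2('all', C, S, k);    % horizontal/vertical/diagonal subbands
    fprintf('level %d: detail subband size %dx%d\n', k, size(H, 1), size(H, 2));
end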

We use a new method for partitioning the wavelet coefficients into spatio-temporal (s-t) blocks to obtain higher error resilience and to support error concealment [7-9]. Instead of grouping adjacent coefficients, we group coefficients at fixed intervals in the lowest subband to obtain interleaved trees. Every sub-block corresponds to a group of full frames within the image sequence, because each group of interleaved trees has coefficients dispersed over the entire frame group. We then separate the stream into fixed-length packets and encode each one with a channel code. Experiments show that this method gives higher error resilience over noisy channels, since the decoded coefficients affected by an early decoding error are spread out over the whole area of the sequence, and the lost coefficients can be concealed from the surrounding coefficients even if some of the sub-streams are missing entirely.

 

Motion estimation is an effective way to capture the spatio-temporal correlation in video signals and is widely used in many state-of-the-art video processing algorithms. Its importance is confirmed by the ubiquitous presence of motion estimation/compensation steps in the series of video coding standards from MPEG-1 to H.264. However, the robustness of motion estimation often suffers from problems such as ambiguities of the motion trajectory (i.e., the aperture problem), illumination variations, and noise in the video sequences. Meanwhile, in practical applications there is always a trade-off between the accuracy of the motion vectors and the computational complexity.

In this paper, we explore a new framework for video processing based on the recently proposed wavelet transform with the SPIHT algorithm. Instead of containing an explicit motion estimation step, the wavelet transform provides a motion-selective subband decomposition for video signals. Similar approaches to video processing using various motion-selective 3-D transforms have been proposed previously by several researchers. A potential advantage of the transform is that its directional resolution can be refined by invoking more levels of decomposition. In practice, we usually choose to have 192 or more directional subbands at the finer scales of the wavelet-with-SPIHT transform, which can potentially lead to a more accurate separation of the different motion information in the video signals. This paper is organized as follows. Section 2 describes the wavelet-based video coder. Section 3 details the important steps of the algorithm and shows why the wavelet-with-SPIHT transform can potentially provide an efficient representation for video signals without explicit motion estimation. We present experimental results in Section 4 and conclude the paper in Section 5.

 

2. WAVELET-BASED VIDEO CODER:

The system block diagram is shown in Figure 1. When the encoder receives an input frame, the scene change detector tells the encoder whether the current input frame should be compressed as an independent frame (key frame) or as a delta frame; the first frame always has to be compressed as a key frame. We need to keep the reference frames the same at both the encoder and the decoder. This can be achieved by adding a feedback loop in the decoder, such that the decoded reference frames at both the encoder and the decoder are locked to the same data rate -- the lowest data rate. We assume that the target data rate R is within the range RL ≤ R ≤ RH, and that the bits required to encode the motion vector fields have data rate RMV, where RMV < RL. At the encoder, since RMV is known, the embedded bit stream can always be decoded at rate RL - RMV, which is then added to the predicted frame to generate the reference frame Pref. At the decoder, the embedded bit stream is decoded at two data rates: the target data rate R - RMV and the fixed data rate RL - RMV. The frame decoded at rate RL - RMV is added to the predicted frame to generate the reference frame, which is exactly the same as the reference frame Pref used in the encoder. The frame decoded at rate R - RMV is added to the predicted frame to generate the final decoded frame. This way, the reference frames at the encoder and the decoder are identical, which leaves the decoded PEF as the only source of distortion. Hence, error propagation is eliminated.
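The rate bookkeeping above can be summarized in a short sketch; the numeric values below are illustrative placeholders, not rates from our experiments.

% Sketch of the two-rate decoding that locks the encoder and decoder
% reference frames together; all rates in bits per pixel, values illustrative.
R_L  = 0.5;                 % lowest supported data rate
R_H  = 2.0;                 % highest supported data rate
R    = 1.2;                 % target data rate, with R_L <= R <= R_H
R_MV = 0.1;                 % rate of the motion vector field, R_MV < R_L
refRate   = R_L - R_MV;     % embedded stream truncated here for the reference frame
finalRate = R - R_MV;       % embedded stream truncated here for the displayed frame
% Encoder and decoder both truncate the same embedded stream at refRate,
% so their reference frames match exactly and error propagation is avoided.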

 

3. IMPORTANT STEPS OF THE ALGORITHM:

3.1 Scene Change Detector:

The first step of the video coder is to detect whether the frame should be coded as a key frame or a delta frame. In practice, the first frame after every scene change is taken as a key frame, since it contains more information than the frames that follow. The key frame also acts as a reference for the subsequent delta frames at decoding time. To decide whether a frame is key or delta, several schemes are used (a combined sketch of all three measures follows the list):

1) Sum of absolute differences technique (SAD): In the SAD-based technique, the sum of absolute differences between consecutive frames is used as the detection measure. If the SAD value is above a user-defined threshold, the frame is taken as a key frame; otherwise it is considered a delta frame.

 

2) Variance-based technique: In the variance-based technique, the variance between consecutive frames is taken as the measure for the key/delta decision.

 

3) Mean square difference technique (MSD): In the MSD-based technique, the mean square difference between consecutive frames is used as the detection measure. If the MSD value is above a user-defined threshold, the frame is taken as a key frame; otherwise it is considered a delta frame.
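The three measures can be computed as in the following minimal MATLAB sketch, assuming two consecutive grayscale frames f1 and f2. The threshold value is a placeholder, and the variance test here is applied to the inter-frame difference, which is one reasonable reading of the scheme above.

% Sketch: key/delta decision from the three inter-frame change measures.
% f1, f2: consecutive grayscale frames (matrices of equal size).
d   = double(f2) - double(f1);
SAD = sum(abs(d(:)));          % 1) sum of absolute differences
VAR = var(d(:));               % 2) variance of the inter-frame difference
MSD = mean(d(:).^2);           % 3) mean square difference
T_SAD = 1e6;                   % user-defined threshold (placeholder value)
if SAD > T_SAD                 % analogous tests apply to VAR and MSD
    frameType = 'key';         % scene change: code as a key frame
else
    frameType = 'delta';       % code as a delta frame
end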

 


 

Figure 1. Typical Video Coder


3.2. Block Matching:

The temporal prediction technique used in video compression is known as motion estimation. The basic premise of motion estimation is that in most cases, consecutive video frames will be similar except for changes induced by objects moving within the frames. In the trivial case of zero motion between frames (and no other differences caused by noise, etc.), it is easy for the encoder to efficiently predict the current frame as a duplicate of the prediction frame. When this is done, the only information necessary to transmit to the decoder becomes the syntactic overhead necessary to reconstruct the picture from the original reference frame. When there is motion in the images, the situation is not as simple.

 

Figure 2.  Video Frames with motion

 

Figure 2 shows an example of a frame with two stick figures and a tree. The second half of the figure shows a possible next frame, where panning has moved the tree down and to the right, and the figures have moved farther to the right because of their own motion in addition to the panning. The problem motion estimation must solve is how to adequately represent the changes, or differences, between these two video frames. Motion estimation solves this problem by performing a comprehensive two-dimensional spatial search for each block of a defined size. The search is based on correlation or MSD between blocks. It is well known that a full, exhaustive search over a wide two-dimensional area yields the best matching results in most cases, but this performance comes at an extreme computational cost to the encoder. Since motion estimation is usually the most computationally expensive part of the video encoder, lower-cost encoders may limit the pixel search range or use techniques such as telescopic searches, usually at some cost in video quality.
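A minimal MATLAB sketch of the exhaustive search for a single macroblock follows. It assumes a 16x16 block and a +/-7 pixel search range, and uses SAD as the matching cost (correlation or MSD, as mentioned above, could be substituted); r and c mark the block's top-left corner.

% Sketch: exhaustive (full-search) block matching for one macroblock.
% ref, cur: grayscale frames; (r, c): top-left corner of the current block.
B = 16;                                  % macroblock size
W = 7;                                   % search range: +/- W pixels
blk  = cur(r:r+B-1, c:c+B-1);            % macroblock to be coded
best = inf;  mv = [0 0];
for dy = -W:W
    for dx = -W:W
        rr = r + dy;  cc = c + dx;       % candidate position in the reference frame
        if rr < 1 || cc < 1 || rr+B-1 > size(ref,1) || cc+B-1 > size(ref,2)
            continue;                    % skip candidates falling outside the frame
        end
        cand = ref(rr:rr+B-1, cc:cc+B-1);
        cost = sum(abs(blk(:) - cand(:)));   % SAD matching cost
        if cost < best
            best = cost;  mv = [dy dx];  % motion vector of the best match so far
        end
    end
end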

 

Figure 3. Block Matching

 

As can be seen in Figure 3, the top frame is a poor match for the macroblock to be coded. The middle frame is a fair match, as there is some commonality between the two macroblocks. The bottom frame is the best match, with only a slight error between the two macroblocks. Because a relatively good match has been found, the encoder assigns motion vectors to the macroblock, which indicate how far horizontally and vertically the macroblock must be moved so that a match is made. Each forward- or backward-predicted macroblock may thus contain two motion vectors, so truly bidirectionally predicted macroblocks utilize four motion vectors.

 

 

Figure 4. Block-based prediction

 

In Figure 4, the predicted frame is subtracted from the desired frame, leaving a (hopefully) less complicated residual error frame that can be encoded much more efficiently than before motion estimation. The more accurately the motion is estimated and matched, the more likely it is that the residual error approaches zero and the coding efficiency is high. Further coding efficiency is achieved by exploiting the fact that motion vectors tend to be highly correlated between macroblocks. Because of this, the horizontal component is compared to the previously valid horizontal motion vector and only the difference is coded; the same difference is calculated for the vertical component before coding. These difference codes are then described with a variable length code for maximum compression efficiency, as sketched below.
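The differential coding of the motion vector components can be sketched as follows; mvx and mvy are assumed vectors of per-macroblock components, and the variable length coding step itself is omitted.

% Sketch: differential coding of motion vector components.
% mvx, mvy: vectors of per-macroblock horizontal/vertical MV components.
dmvx = [mvx(1), diff(mvx)];   % first MV sent as-is, then only the differences
dmvy = [mvy(1), diff(mvy)];
% The small differences would then be entropy-coded with a variable
% length code; the VLC step is omitted from this sketch.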

 

3.3 Coding and Decoding:

If the current frame is a key frame, we decompose it with the 2-D wavelet transform, using the standard 9/7 filter bank. We then apply the SPIHT coding algorithm to the decomposed key frame, convert it into a bit stream, and send it to the decoder. We also keep this key frame in a buffer to compute the prediction error frame for the next frame. If the current frame is a delta frame, we first compute the motion vectors between the delta frame and the key frame stored in the buffer. We transmit these motion vectors to the decoder and also use them to motion-compensate the key frame toward the current delta frame. We then compute the difference between the delta frame and the motion-compensated key frame to generate the prediction error frame (PEF). We code this PEF in the same manner as the key frame. Note that we assign one third as many bits to the PEF as to the key frame, since it contains much less information.
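The coding path can be summarized in the following minimal sketch. Here spiht_encode, blockMatch, and motionCompensate are hypothetical placeholder functions (MATLAB has no built-in SPIHT coder), while frame, isKey, refFrame, and the key frame bit budget bpp_key are assumed to be maintained by the surrounding coding loop.

% Minimal sketch of the encoder; spiht_encode, blockMatch and
% motionCompensate are hypothetical placeholders, not toolbox functions.
budget = round(bpp_key * numel(frame));        % bit budget for a key frame
if isKey
    [C, S] = wavedec2(frame, 3, 'bior4.4');    % 9/7 wavelet decomposition
    stream = spiht_encode(C, S, budget);       % embedded SPIHT bit stream
    refFrame = frame;                          % buffer key frame as the reference
else
    mv   = blockMatch(refFrame, frame);        % motion vectors (Section 3.2)
    pred = motionCompensate(refFrame, mv);     % motion-compensated prediction
    pef  = frame - pred;                       % prediction error frame (PEF)
    [C, S] = wavedec2(pef, 3, 'bior4.4');
    stream = spiht_encode(C, S, round(budget / 3));  % PEF gets 1/3 of the key budget
end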

The decoder applies the inverse steps. At the receiver we first receive the bit stream of the key frame and reconstruct the frame by inverse SPIHT followed by the inverse DWT. We then receive the motion vectors and use them to adjust the first frame. Finally, we receive the bit stream of the PEF and add the decoded PEF to the motion-adjusted first frame to generate the second frame. The same procedure is repeated until the next key frame arrives.
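The decoder mirrors these steps; spiht_decode and motionCompensate are again hypothetical placeholders.

% Minimal sketch of the decoder; spiht_decode and motionCompensate
% are hypothetical placeholders.
if isKey
    [C, S] = spiht_decode(stream);          % inverse SPIHT
    frame  = waverec2(C, S, 'bior4.4');     % inverse DWT reconstructs the key frame
    refFrame = frame;                       % keep as reference for the delta frames
else
    pred = motionCompensate(refFrame, mv);  % adjust the reference with received MVs
    [C, S] = spiht_decode(stream);
    pef   = waverec2(C, S, 'bior4.4');      % decoded prediction error frame
    frame = pred + pef;                     % final reconstructed delta frame
end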

 

4. RESULTS:

In our simulations we consider standard video sequences (Foreman.yuv, Akiyo.yuv, etc.). The videos we consider contain few scene changes, so we take every fifth frame as a key frame for the coding. To evaluate the compression performance we use the compression gain formula given below:

 

CR = (N · FN · Rt) / (N · (KF · Rt1 + DFN · Rt2)) = (FN · Rt) / (KF · Rt1 + DFN · Rt2)

CR – Compression ratio.
N – Frame size (pixels per frame).
FN – Total number of frames.
KF – Total number of key frames.
DFN – Total number of delta frames.
Rt – Number of bits per pixel in the original video.
Rt1 – Number of bits per pixel assigned to a key frame in the compressed video.
Rt2 – Number of bits per pixel assigned to a delta frame in the compressed video.

Note that in our simulation we use Rt2 = 0.3 · Rt1.
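Assuming the reconstruction of the formula above is correct, the compression ratio can be evaluated directly; the one-line MATLAB sketch below encodes it as a function handle (the exact definition used to produce Table 1 may differ).

% Compression ratio from the symbols defined above; N cancels out.
cr = @(FN, KF, DFN, Rt, Rt1, Rt2) (FN * Rt) / (KF * Rt1 + DFN * Rt2);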

 

 

Figure 5.1. Original key frame.
Figure 5.2. Compressed key frame (key frame BPP 3 and delta frame BPP 1).
Figure 5.3. Compressed key frame (key frame BPP 1.8 and delta frame BPP 0.6).
Figure 5.4. Compressed key frame (key frame BPP 0.9 and delta frame BPP 0.3).
Figure 5.5. Original delta frame.
Figure 5.6. Compressed delta frame (key frame BPP 3 and delta frame BPP 1).
Figure 5.7. Compressed delta frame (key frame BPP 1.8 and delta frame BPP 0.6).
Figure 5.8. Compressed delta frame (key frame BPP 0.9 and delta frame BPP 0.3).

 

Table 1 below shows how the compression ratio varies with the key frame rate and the delta (non-key) frame rate.

 

 


Table 1. Compression ratio with respect to change in key frame rate and non-key frame rate

Key frame rate (bpp):               0.6        1.2        2          3          6
Delta (non-key) frame rate (bpp):   0.3        0.3        0.6        0.9        1.8
Compression ratio:                  0.376254   0.528763   0.955833   1.433779   1.917706


 

5. DISCUSSION:

When the frame rates of the key and non-key frames are varied while maintaining Rt2 = 0.3 · Rt1, the compression ratio increases up to approximately 2 with increasing frame rates. The reconstructed video is nearly identical to the original video, although the streaming rate increases slightly and the quality is slightly blurred.

The method is applicable to video conferencing and video denoising.

 

6. REFERENCES:

1. Hosam Khalil, Amir F. Atiya, and Samir Shaheen, "Three-Dimensional Video Compression Using Subband/Wavelet Transform with Lower Buffering Requirements," IEEE Transactions on Image Processing, vol. 8, no. 6, June 1999.

2. Othman O. Khalifa, Sering Habib Harding, and Aisha-Hassan A. Hashim, "Compression using Wavelet Transform," Signal Processing: An International Journal, vol. 2, no. 5, pp. 17-26.

3. Orkun Alatas, Omar Javed, and Mubarak Shah, "Video Compression Using Structural Flow."

4. S. Jeyakumar and S. Sundaravadivelu, "A Wavelet based Lossless Video Compression using Adaptive Prediction and Motion Approximation," ICGST-GVIP Journal, vol. 9, no. 4, August 2009.

5. A. Secker and D. Taubman, "Motion-compensated highly scalable video compression using an adaptive 3D wavelet transform based on lifting," in Proc. IEEE International Conference on Image Processing, 2001.

6. L. Luo, J. Li, S. Li, Z. Zhuang, and Y.-Q. Zhang, "Motion-compensated lifting wavelet and its application in video coding," in Proc. IEEE International Conference on Multimedia and Expo, August 2001.

7. Gernot Ziegler, Hendrik P. A. Lensch, Marcus Magnor, and Hans-Peter Seidel, "Multi-Video Compression in Texture Space using 4D SPIHT."

8. Sungdae Cho and William A. Pearlman, "Embedded Video Compression with Error Resilience and Error Concealment," 2006.

9. Beong-Jo Kim and William A. Pearlman, "Fast Color-Embedded Video Coding with SPIHT," 1998.

10. Andrew Secker and David Taubman, "Highly Scalable Video Compression Using a Lifting-Based 3D Wavelet Transform with Deformable Mesh Motion Compensation," The University of New South Wales, 2000.

11. Mohammed Raad and Alfred Mertins, "From lossy to lossless audio compression using SPIHT," in Proc. 5th International Conference on Digital Audio Effects (DAFx-02), Hamburg, Germany, September 26-28, 2002.

 

 

 

 

Received on 10.06.2011        Accepted on 22.07.2011        

©A&V Publications. All rights reserved.

Research J. Engineering and Tech. 2(3): July-Sept. 2011 page 123-127